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ABSTRACT 


As  companies  seek  proteetion  from  cyber  attacks,  justifying  proper  levels  of  investment 
in  eyber  seeurity  is  essential.  Like  all  investments,  eyber  defense  costs  must  be  weighed 
against  their  expeeted  benefits. 

While  some  eyber  investment  models  exist  that  ean  relate  costs  and  benefits,  these 
models  are  largely  untested  with  experimental  data.  This  researeh  develops  an 
experimental  framework  and  statisties  for  testing  and  measuring  the  effieacy  of  eyber 
mitigation  methods,  sueh  that  they  ean  be  integrated  into  existing  cyber  investment 
models. 

This  work  surveys  cyber  security  investment  models  and  frameworks.  Using 
eyber  exereises  as  a  souree  of  attack  data,  types  of  exereises  and  how  information  is 
reeorded  was  studied. 

A  proof  of  eoneept  for  an  experimental  framework  able  to  reeord  statisties  on 
eyber  exercise  attaeks  and  defenses  was  developed.  The  environment  is  intended  to 
resemble  that  of  an  actual  cyber  attaek,  and  to  eollect  attack  and  defense  data  in  a 
repeatable  and  teehnology-agnostie  manner.  Possible  future  work  eould  illuminate 
mathematical  relationships  between  threat  and  mitigation. 

Statisties  and  proeedures  are  proposed  that  are  applieable  to  the  speeifie  proposed 
and  similar  frameworks.  Sueh  statistics  could  be  ineorporated  into  cyber  models, 
ultimately  leading  to  a  more  rational  understanding  of  eyber  attaek  and  defense. 
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I.  INTRODUCTION 


A,  BACKGROUND 

Many  senior  officials  in  the  U.S.  Government  and  private  sector  contend  that  the 
sustained,  strategic,  and  surreptitious  theft  of  America’s  economic  innovation  could 
currently  pose  an  existential  threat  to  the  economy  and  security  of  the  United  States 
(Rogers,  2011).  As  put  by  General  Keith  Alexander,  the  aggregate  theft  of  American 
Intellectual  property  constitutes  the  “greatest  transfer  of  wealth  in  history”  (Rogin,  2012). 
While  physical  property  is  relatively  secure,  and  the  security  of  financial  data  has  been 
made  paramount,  the  intellectual  property  that  fuels  America’s  economy  remains  highly 
vulnerable.  In  order  to  understand  this  current  state  of  cyber  defense,  researchers  must 
examine  the  decisions  that  network  defenders  make,  and  the  information  necessary  to 
make  those  decisions  in  an  informed  manner. 

In  order  for  senior  leadership  to  make  informed  decisions,  cyber  defenders  sorely 
need  quantitative  models  and  statistics.  Real-world  intrusion  data  is  not  readily  available, 
nor  will  it  become  readily  available  in  the  near  term.  However,  cyber  defense  exercises 
are  free  of  much  of  the  legal  and  classification  restrictions  of  real  intrusions,  and  can 
provide  valuable  data  in  a  free,  public,  and  standardized  manner.  The  offensive  and 
defensive  tactics  used  during  cyber  exercises  also  mirror  some  of  the  procedures  used  in 
real  world  attacks.  Therefore,  exercises  can  serve  as  a  risk-free  proxy  for  data  gathering; 
that  data  can  then  be  used  to  feed  cyber  defense  models,  inform  senior  leadership 
decision-making,  and  further  the  understanding  of  the  cybersecurity  ecosystem. 

B,  MOTIVATION 

In  “Why  Information  Security  is  Hard,”  Ross  Anderson  outlines  some  of  the 
possible  reasons  for  the  current  “state”  of  cybersecurity  defense.  He  argues  that  the 
motivations,  liabilities,  and  responsibilities  of  parties  in  cyberspace  largely  determine 
which  actions  are  taken,  and  by  whom,  to  secure  their  networks.  He  points  to  “The 
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Tragedy  of  the  Commons”  as  an  allegorieal  example  of  the  issue  of  eolleetive  aetion  (or 
laek  thereof)  in  eyberseeurity: 

If  a  hundred  peasants  graze  their  sheep  on  the  village  eommon,  then 
whenever  another  sheep  is  added  its  owner  gets  almost  the  full  benefit, 
while  the  other  ninety-nine  suffer  only  a  small  deeline  in  the  quality  of  the 
grazing.  So  they  aren’t  motivated  to  objeet,  but  rather  to  add  another  sheep 
of  their  own  and  get  as  mueh  of  the  grazing  as  they  ean.  The  result  is  a 
dustbowl;  and  the  solution  is  regulatory  rather  than  teehnieal.  (Anderson, 

2001) 

The  Tragedy  of  the  Commons  highlights  the  aggregated  negative  externalities  of 
single  aetors’  self-interested  aetions.  In  eyberseeurity,  the  issue  is  the  same,  but  inverted; 
the  laek  of  aetion  on  the  part  of  a  small  fraetion  of  eompanies  to  seeure  their  networks 
makes  everyone  else  more  vulnerable.  Citing  Hal  Varian’s  2000  NYTimes  artiele, 
“Managing  Online  Seeurity  Risks”  (Varian,  2000),  Anderson  argues  that  eosts  should  be 
pushed  to  those  “hosting”  the  malieious  aetivity,  so  to  speak.  In  the  NYTimes,  Varian 
puts  it  simply:  “One  reason  that  eomputer  seeurity  is  so  poor  in  praetiee  is  that  the 
liability  is  so  diffuse.” 

Adding  to  the  discussion  of  forces  at  play  in  cyberspace,  Andrew  Oldyzko 
examines  the  “Economics,  Psychology,  and  Sociology  of  Security”  in  his  paper  by  the 
same  name  (Odlyzko,  2003).  He  touches  on  some  of  the  economic  motivations  cited  by 
Anderson,  writing  “security  does  not  come  for  free,  and  so  it  is  necessary  to  look  at  the 
tradeoffs  between  costs  and  benefits.  Furthermore,  it  is  necessary  to  look  at  the  incentives 
of  various  players,  as  many  have  an  interest  in  passing  on  the  costs  of  security  to  others, 
or  of  using  security  for  purposes  such  as  protecting  monopolies.”  This  highlights  the 
importance  of  organizational  and  human  interests  behind  decisions  in  the  cyber  defense 
realm.  Furthermore,  he  believes  the  formal  structures  are,  on  paper,  incomplete,  writing: 
“Our  commercial,  government,  and  academic  enterprises  are  large  organizations  with 
many  formal  rules  and  regulations.  Yet  the  essential  workings  of  these  enterprises  are 
typically  based  on  various  social  relations  and  unwritten  rules.”  While  technology  and 
systems  may  have  certain  expected  behaviors  in  a  vacuum,  the  real  performance  of  these 
technologies  is  at  the  mercy  of  the  humans  operating  them,  and  by  extension,  their  own 
cost-benefit  decisions. 
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Information  Sharing 

In  “Cyber  Information-Sharing  Models:  An  Overview”  (The  MITRE  Corporation, 
2012),  researehers  at  MITRE  argue  that  the  eurrent  state  of  affairs  in  information  sharing 
is  a  contributing  factor  to  the  poor  state  of  cyber  defense.  They  write  that  “a  key  element 
in  defending  against  [cyber]  attacks  is  having  information  about  the  tools,  techniques  and 
resources  (physical,  financial,  and  human)  that  adversaries  are  using  to  breach  cyber 
defenses.”  The  argument  here  is  that  even  with  the  right  motivations,  defense 
technologies,  and  security  procedures  in  place,  many  organizations  may  simply  lack  the 
information  they  need  to  defend  themselves. 

These  examples  illustrate  that  there  is  much  more  to  the  cybersecurity  problem 
that  just  poor  cryptography  or  out-of-date  AV  signatures.  There  are  large-scale  economic, 
sociological,  and  information  dissemination/translation  issues  at  play,  and  these  issues 
may  not  be  addressed  singly  through  technologies,  policies,  or  legislation.  Rather,  it  is 
critical  to  first  understand  the  existing  frameworks  in  which  cybersecurity  is  currently 
discussed. 

Current  Paradigms 

Though  no  domain  or  criminal  area  is  quite  like  cyberspace,  a  discussion  of 
cybersecurity  models  can  be  framed  by  existing  paradigms.  Perhaps  the  first 
macroeconomic  paradigm  that  comes  to  mind  is  that  of  physical-world  crime.  Criminal 
economics  has  been  studied  for  centuries,  and  many  models  exist,  from  narcotics  to  bank 
robbery  to  auto  theft.  If  focused  on  cybercrime,  these  models  may  yield  useful 
frameworks  and  bring  researchers  closer  to  understanding  the  mechanics  and  strategies  of 
the  cyber  cartels.  However,  the  existing  criminal  models  are  still  largely  rooted  in  the 
physical  world,  and  thus  may  not  account  for  the  distributed  and  instantaneous  nature  of 
cyberspace. 

Second  is  a  military  offense/defense  model.  Counting  “good  guys”  on  one  side 
and  “bad  guys”  on  the  other  may  help  us  define  who  is  currently  “winning.”  This  model. 
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too,  is  rooted  in  geography  and  may  not  correctly  weigh  the  impact  of  small-scale  issues, 
instead  focusing  on  high-profde  events  and  advanced  capabilities. 

A  third  paradigm  to  model  cybersecurity  is  that  of  public  health.  Medical  based 
models,  specifically  for  infectious  disease,  exhibit  the  speed  and  networked  environment 
of  cyberspace.  They  also  take  into  account  the  effect  of  mitigation  measures  on  a 
population,  including  possible  second-and  third  level  after  effects.  Additionally,  public 
health  model  demonstrates  the  interchangeable  nature  of  threats  and  mitigations.  That  is, 
a  given  mitigation  can  be  used  to  treat  or  deter  a  variety  of  threats.  Finally,  public  health 
also  shares  the  distributed  and  simultaneous  nature  of  cyber  defense — a  given  threat  can 
strike  a  number  of  victims  all  around  the  same  time,  requiring  rapid  response  and 
information  sharing. 

C.  PROBLEM  STATEMENT 

In  order  to  make  a  rational  argument  for  cybersecurity  investment,  organizations 
must  first  ask  whether  cyber  breaches  present  a  real  cost  to  the  organization.  Next,  they 
may  ask  if  insuring  against  those  losses  is  a  better  solution  versus  actually  trying  to 
prevent  them  in  the  first  place.  Finally,  if  they  do  believe  that  the  costs  are  worth 
preventing,  and  that  insurance  does  not  present  a  viable  option,  they  must  ask  which 
measures  to  take,  and  what  the  expected  benefits  of  those  measures  are.  In  order  to  make 
this  assessment  objectively,  firms  need  to  be  able  to  perform  a  cost-benefit  analysis  of 
cyber  security  investments  and  measures.  To  do  so,  they  must  have  an  understanding  of 
the  expected  benefits  of  cyber  mitigation  measures,  as  well  as  the  expected  cost  of  not 
applying  those  measures.  Without  this  data,  firms  may  not  be  able  to  empirically  justify 
proactive  cybersecurity  investments. 

However,  this  data  is  hard  to  find.  Even  if  available,  it  is  often  not  standardized. 
And  lastly,  it  usually  applies  to  a  more  macro  understanding  of  the  incident,  rather  than 
the  specific  tactics  used  by  both  offense  and  defense  during  an  intrusion.  While  these 
details  may  not  concern  senior  leadership,  they  do  form  the  foundation  of  understanding 
cyber  attacks  and  mitigations.  That  understanding  can  then  inform  strategic  models  and 
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decisions.  Without  this  understanding,  however,  leaders  are  left  to  a  theoretical 
expectation,  or  best  guess  estimate  for  their  investments. 

In  kinetic  warfare  and  physical  crime,  real-world  simulations  are  understandably 
tricky.  You  cannot  fully  use  real  weapons  in  a  wargame,  and  you  cannot  fire  real  bullets 
in  a  police  exercise.  In  cyber,  however,  you  can  -  and  you  can  do  it  over  and  over  again. 
For  example,  it  is  possible  to  build  an  environment  which  closely  resembles  a  business 
network,  complete  with  Windows  domains,  a  DMZ  network,  routers,  firewalls  and  other 
elements.  This  network  can  be  attacked,  infiltrated  and  destroyed  without  any  real  harm 
to  real-world  networks  or  operations.  Using  virtual  machines,  it  can  be  rebuilt  and  done 
all  over  again.  Cyber  exercises,  which  already  exist  in  many  forms,  thus  present  an 
untapped  resource  of  cyber  attack  and  defense  statistics;  we  simply  have  not  been 
collecting  them  properly.  If  efforts  can  be  made  to  capture  and  standardize  statistics 
during  cyber  exercises,  the  aggregate  dataset  could  be  used  to  build  and  test  and  model  a 
myriad  of  scenarios. 

D,  APPROACH 

This  thesis  project  involves  the  development  of  a  cyber  exercise  methodology  in 
which  a  single  user,  two  users,  or  classroom  could  create,  gather,  and  standardize  cyber 
attack  and  defense  statistics.  The  objective  is  to  develop  a  standardized  exercise 
framework  to  gather  statistics  in  a  technology-agnostic  manner.  The  exercise  will  be  used 
to  inform  and  test  the  statistics.  While  in  theory,  some  statistics  may  make  logical  sense, 
if  they  cannot  actually  be  collected  and  analyzed  from  an  exercise,  then  they  will  not  be 
usable  in  cyber  investment  calculations. 

E,  ASSUMPTIONS  AND  LIMITATIONS 

This  thesis  assumes  certain  understanding  of  cyber  security  terms  and  concepts,  as 
well  as  concepts  used  in  mathematics  and  statistics.  It  is  limited  mainly  by  budget,  hence 
the  use  of  mainly  open  source  and  unsupported  software  in  the  creation  of  the  attack  lab. 


5 


The  thesis  projeet  will  be  comprised  of  the  discussion  of  cyber  metrics  and 
models,  the  reasons  for  using  cyber  exercise  data  to  test  those  models,  and  finally  the 
creation  and  testing  of  an  experimental  cyber  attack  framework. 

While  knowledge  of  cyber  attack  and  defense  methodologies  is  informed  by  real 
operational  experience  from  US-CERT  cases,  all  the  data  in  this  thesis  is  unclassified  and 
fabricated  in  a  lab  environment.  The  tools  and  exploits  used  herein  are  seen  and  used  in 
real-world  environments. 

F.  THESIS  ORGANIZATION 

Chapter  II  provides  background  on  existing  cyber  models  and  metrics.  Chapter  III 
provides  the  criteria  for  qualifying  cyber  exercises  as  a  source  of  real  statistical  data. 
Chapter  IV  outlines  the  construction  of  the  exercise  environment,  execution  of  a  test 
exercise,  and  the  results  of  data  collection.  Chapter  V  summarizes  findings  and  proposes 
future  work. 
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II.  BACKGROUND— EXISTING  CYBER  MODELS 


A,  THE  COST  OF  FAILURE 

In  a  recently  published  Forrester  Research  Paper,  “Determine  The  Business  Value 
of  An  Effective  Security  Program  — Information  Security  Economics  101,”  Ed  Eerrara 
argues  that  “To  fully  understand  security’s  financial  impact  on  the  organization,  CISOs 
should  understand  ah  the  various  costs  of  protecting  information.  This  includes  the  fixed 
costs  as  well  as  variable  costs,  especially  those  related  to  breaches”  (Eerrara,  2002). 

1.  Stock  Market  Returns 

One  rather  obvious — and  elegant — method  to  measure  the  cost  of  a  cybersecurity 
breach  in  the  private  sector  is,  simply,  to  measure  the  stock  market  impact  of  breach 
announcements.  This  is  precisely  what  Katherine  Campbell  et  al.  studied  in  2003. 
(Katherine  Campbell,  2003)  In  the  study,  the  researchers  found  “an  overall  negative  stock 
market  reaction  to  public  announcements  of  information  security  breaches.”  This  is 
largely  unsurprising  news;  however,  Campbell  further  note,  “We  find  a  highly  significant 
negative  market  reaction  for  information  security  breaches  involving  unauthorized  access 
to  confidential  data,  but  no  significant  reaction  when  the  breach  does  not  involve 
confidential  information.”  Thus,  the  nature  of  the  breach,  in  the  eyes  of  investors,  does 
appear  to  affect  the  market  cost. 

An  associated  study  was  performed  by  Gordon,  Eoeb,  and  Zhou  in  2011 
(Eawrence  A.  Gordon,  2011).  The  authors  note  conflicting  studies  since  the  2003  study, 
saying,  “several  researchers  have  studied  the  impact  of  [cyber]  breaches  on  the  stock 
market  returns  of  firms.  The  results  from  these  studies,  however,  have  been  mixed.”  Thus, 
Gordon  et  al.  used  a  more  “sophisticated  market  model  over  a  long  period  and  over  two 
distinct  and  naturally  arising  sub-periods,”  and  in  this  study,  attempted  to  “resolve 
conflicting  evidence  from  previous  studies  concerning  the  effect  of  information  security 
breaches.”  (Eawrence  A.  Gordon,  2011).  Their  findings  supported  their  earlier  thoughts, 
that  “...news  about  information  security  breaches  (where  such  breaches  are  treated  as  a 
generic  category)  had  a  statistically  significant  effect  on  the  stock  returns  of  publicly 
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traded  firms.”  This  appears  to  suggest,  at  least,  that  eyber  breaches,  in  fact,  can  cause  a 
very  real  negative  financial  impact  to  firms. 

The  2011  stock-return  analysis  above  also  noted  an  interesting  phenomenon:  an 
apparent  decrease  in  the  market  effect  of  a  breach — namely,  that  breaches  before  9/11 
cost  more  to  the  stock  market  performance  of  a  firm  than  breaches  after  9/11  did.  Since 
the  breaches  themselves  had  little  other  differences,  human  perception  likely  has  much  to 
do  with  this.  Put  by  Gordon  et  ah,  “there  seems  to  a  shift  in  the  attitude  among  investors 
toward  viewing  information  security  breaches  as  creating  a  corporate  “nuisance”  (or 
merely  another  recurring  operating  cost)  rather  than  creating  a  potentially  serious 
economic  threat  to  the  survival  of  firms.”  If  cyber  breaches  are,  indeed,  viewed  as  an 
“operating  cost”  in  today’s  environment,  that  may  give  private  sector  leaders  some 
relief — after  all,  negative  events  can  be  covered  by  insuring  against  them. 

2.  Cyber  Insurance 

Despite  its  seemingly  logical  place  in  the  cybersecurity  field,  cyber  insurance  has 
so  far  failed  to  provide  a  mature,  widely  accepted  framework  for  insuring  against  cyber 
risk.  In  “Aegis,  A  Novel  Cyber  Insurance  Model,”  for  example.  Pal,  Golubchik  and 
Psounis  propose  a  new  quantitative  model  for  cyber  insurance,  but  also  concede  that  their 
model,  the  Aegis  framework,”also  faces  problems  [related  to]  interdependent  security.” 
(Pal,  Golubchik,  &  Psounis,  2011).  They  are  essentially  saying  that  the  fundamental 
interdependence  of  assets  in  cyberspace,  among  other  things,  currently  limits  the 
application  of  their  cyber  insurance  model.  That  is  not  to  say  that  the  problem  is 
intractable:  Rainer  Bohme  and  Gaurav  Kataria  attempt  to  model  correlation  in  “Models 
and  Measures  for  Correlation  in  Cyber-Insurance.”  In  this  study,  they  find  that  because  of 
the  high  degree  of  interdependence  present  in  cyberspace,  “our  simulation  results  indicate 
that  cyber-insurance  is  best  suited  for  classes  of  risk  with  high  internal  and  low  global 
correlation,”  that  is,  companies  or  assets  which  have  minimal  externalities.  Additionally, 
they  cite  a  lack  of  good  empirical  data  with  which  to  test  their  findings,  suggesting  that  as 
a  direction  for  future  work.  (Kataria,  2006) 
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In  their  2010  paper,  “Modeling  Cyber-Insuranee:  Towards  A  Unifying 
Framework”  (Bohme  &  Sehwartz,  2010),  Rainer,  Bohm,  and  Sehwartz  examine  various 
eyber-insuranee  models,  noting  the  relatively  disappointing  prospeets  in  the  eyber 
insuranee  market:  “Even  a  eonservative  foreeast  of  2002,  whieh  predieted  a  global 
market  for  eyber-insuranee  worth  $2.5  billion  in  2005,  turned  out  to  be  five  times  higher 
than  the  size  of  the  market  in  2008  (three  years  later).  Overall,  in  relative  terms,  the 
market  for  eyber-insuranee  shrank  as  the  Internet  eeonomy  grew.”  The  authors  note  also 
that  this  weak  eyber  insuranee  market  is  often  attributed  to  a  laek  of  understanding:  “The 
observable  under-development  of  the  market  for  eyber-insuranee  is  often  attributed  to 
insurers’  laek  of  experienee  with  a  new  kind  of  risk,  eombined  with  insuffieient  aetuarial 
data  hindering  eompetitive  prieing.”  Thus,  it  is  diffieult  for  insuranee  providers  to 
determine  what  insuranee  should  eost,  beeause  of  a  laek  of  objective  data  informing  the 
expected  loss  of  a  cyber  event.  This  “lack  of  actuarial  data,”  as  will  be  seen  later,  is  not 
just  missing  in  insurance  models — it  is  indeed  the  missing  piece  in  the  cost/benefit  model 
of  cybersecurity  investment. 

B,  MEASURING  BENEFITS 

1,  The  Gordon-Loeb  Model 

In  2002,  University  of  Maryland  Professors  Lawrence  Gordon  and  Martin  Loeb 
developed  a  popular  model  for  objectively  evaluating  investment  benefit.  The  model 
takes  an  inductive,  logical  approach  towards  arriving  at  a  final  assessment  of  investment 
benefit.  Essentially,  the  equation  which  Gordon  and  Loeb  developed  measures  the 
economic  benefit  of  information  security  as  the  difference  between  money  spent  on  the 
investment  itself,  and  the  marginal  decrease  in  expected  loss  due  to  the  protection 
provided  by  the  investment.  Measuring  the  cost  of  the  investment  is  relatively 
straightforward — it  is  simply  the  cost  of  whatever  people,  products,  and  technology  are 
part  of  the  cybersecurity  investment.  The  impact  of  the  investment,  on  the  other  hand, 
depends  on  the  computed  decrease  in  vulnerability.  This  is  harder  to  measure  inductively. 
Placing  a  baseline  number  on  a  system’s  objective  vulnerability  is  not  easy,  and  no 
common,  widely-used  scale  currently  exists.  However,  the  Gordon-Loeb  model  only  asks 
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for  the  difference  in  vulnerability,  before  and  after  the  security  investment  is  made.  That 
difference  then  informs  the  change  in  expected  loss,  assuming  a  static  economic  value 
assigned  to  the  vulnerable  asset.  As  put  in  their  paper,  “the  key  to  analyzing  information 
security  decisions  is  not  the  vulnerability  (or  the  expected  loss  without  the  investment), 
but  the  reduction  in  expected  loss  with  the  investment.”  Thus,  the  Gordon-Loeb  model 
elegantly  simplifies  the  economic  decision  to  one  bit  of  information;  the  decrease  in 
expected  loss.  This,  however,  is  where  the  model  fails;  it  plugs  in  arbitrary  vulnerability 
functions  without  any  real  explanation  as  to  why  these  relationships  exist.  In  order  to 
complete  this  model,  true  relationships  between  vulnerability  and  mitigation  must  be  built 
and  justified. 

2,  Empirical  Analysis 

Historical  empirical  analysis  has  been  tried.  Liu,  Tanaka,  and  Matsuura  present  “a 
two-step  empirical  analysis  of  investing  in  security  countermeasures  based  on  a  Japanese 
enterprise  survey  of  security  investments  in  the  private  sector  (Liu,  Tanaka,  &  Matsuura). 
The  first  step  measured  the  vulnerability  of  Japanese  enterprise  by  examining  the  change 
in  reported  computer  virus  incidents  reported  to  the  Japanese  government  between  2002 
and  2003,  and  then  examining  which  companies  on  both  lists  made  significant 
cybersecurity  investments.  The  study  was  made  possible  largely  due  to  the  mandatory 
reporting  requirements  of  the  Japanese  Ministry  of  Economy,  Trade,  and  Industry 
(METI),  and  survey  data  gathered  from  Japanese  enterprises.  The  study  found  that 
significant  investments  in  security  protections  (“countermeasures”)  did  indeed  lead  to  a 
decrease  in  computer  virus  incidents.  Persistent,  or  targeted  threats  may  not  have  been 
affected.  While  only  as  good  as  the  survey  information  provided,  this  study  does  provide 
an  empirical  test  of  the  Gordon  Eoeb  model  and  as  such  lays  the  groundwork  for  future 
study. 

Another,  more  mathematical  examination  of  the  Gordon-Eoeb  model,  performed 

by  Jan  Willemson  (Willemson,  2006),  frames  the  need  for  a  further  fleshing  out  of  the 

relationship  between  investment  and  vulnerability:  “In  [their  model],  Gordon  and  Eoeb 

considered  two  specific  function  families,  but  actually  there  is  no  reason  to  assume  that 

any  function  in  any  of  these  families  corresponds  to  any  real  vulnerability  decrease 
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scenario.  Thus,  the  main  direction  for  further  work  is  to  look  for  functions  reflecting 
changes  in  vulnerability  for  some  real  situations.”  The  “further  work”  to  whieh 
Willemson  is  referring  is  preeisely  the  work  necessary  to  test  cyber  investment  models, 
and  thus  inform  deeisions  at  the  boardroom  level. 

Sueh  a  test  of  the  Gordon-Loeb  model  would  require  an  experimental  framework, 
eapable  of  isolating  the  variables  inherent  in  a  cyberseeurity  event,  so  that  data  points  can 
be  gathered,  and  possible  eorrelations  could  be  drawn.  From  this  data,  one  may  be  able  to 
build  a  eontinuous  relationship  between  vulnerability  and  mitigation,  speeifie  to 
partieular  attaek  veetors.  This  relationship  would  fill  out  the  Gordon-Loeb  model,  and 
given  eompany-speeifie  assessments  of  asset  valuations,  a  more  informed  cost-benefit 
equation  eould  be  derived. 

The  investigation  into  the  reasons  behind  cyberseeurity  investment  thus  begins 
with  viewing  the  problem  as  the  aggregate  of  independent,  self-interested  deeisions  by 
firms.  From  there,  one  ean  assess  the  various  models  whieh  may  apply — in  this  ease,  the 
Gordon-Loeb  model.  Understanding  the  firm’s  deeision  proeess  first  requires  assessment 
of  the  real  eost  of  a  breaeh.  Additionally,  one  must  examine  non-investment  options  sueh 
as  insurance.  If  investment  is  a  ehoiee,  it  must  then  be  finaneially  justified.  Sueh  a 
justifieation  model  exists  in  Gordon-Loeb,  but  it  is  missing  a  eritical  element:  the 
relationship  between  vulnerability  and  eyber  investment.  However,  this  relationship  is  not 
eurrently  possible  to  objectively  ascertain  with  publiely  available  intrusion  information. 
Moreover,  it  is  highly  likely  that  there  are  many  sueh  relationships  speeifie  to  partieular 
vulnerabilities  and  mitigations — as  put  by  Willemson:  “elearly,  sueh  funetions  are 
strongly  applieation  area  speeifie.”  Therefore,  a  plausible,  and  produetive  way  forward  in 
modeling  private  seetor  eyber  investment  deeisions  is  to  arm  firms  with  objeetive, 
empirieal  data.  If  mathematieal  relationships  between  vulnerabilities  and  mitigations  ean 
be  derived,  that  would  further  fill  out  the  model.  This  is  a  eritieal  missing  piece  in 
understanding  the  fundamental  eeonomie  mechanies  of  the  cyber  eeosystem. 
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c. 


WHERE  IS  THE  DATA? 


Many  people  would  agree  that  eyber  security  is  a  data-rich  environment,  but 
much  of  the  real-world  data  resides  in  disparate  comers  of  the  private  sector,  academia, 
and  can  also  possibly  be  classified  by  government  agencies.  Additionally,  even  if  that 
data  can  be  shared  it  is  only  useful  in  statistically  significant  amounts,  and  when  it  is 
normalized,  formatted,  and  sensible. 

This  likely  has  much  to  do  with  the  legal  environment  surrounding  cyber 
incidents.  While  companies  have  become  increasingly  open  about  their  intmsions  in 
recent  years,  they  do  not  share  network  diagrams,  log  statistics,  or  mitigation  plans  with 
the  public,  and  for  good  reason.  The  details  of  a  corporate  breach  involve  proprietary  and 
personal  information,  and  as  such  remain  protected.  The  process  of  sanitizing  cyber 
attack  data  to  exclude  these  sensitive  details  is  very  labor  intensive,  and  organizations 
may  not  be  able  to  justify  dedicating  resources  to  that  effort,  especially  since  it  has  little 
to  do  with  addressing  the  problem  at  hand.  Moreover,  the  tools  and  technologies 
deployed  within  an  organization’s  network  often  belong  to  third  party  companies,  and 
exposing  the  failure  of  those  systems  may  similarly  cause  legal  and  contractual  issues  for 
companies.  Most  corporate  press  releases  are  limited  to  general  dates  and  figures 
regarding  the  breach,  as  well  as  customer-designed  guidance  on  protection  from  identity 
theft  or  fraudulent  activities. 

This  state  of  affairs  results  in  droves  of  anecdotes,  third  party  stories,  and  mmors, 
but  very  little  in  the  way  of  useful  statistics.  Not  to  mention,  in  many  cases  it  is  safe  to 
assume  that  the  victim  company  may  still  not  fully  understand  the  full  nature  of  the 
breach,  nor  the  preliminary  measures  taken  by  attackers.  Therefore,  even  in  the  most 
verbose  incident  reporting,  the  data  is  still  limited  by  the  visibility  and  understanding  of 
the  reporting  entity. 

1.  Existing  Metrics 

Several  cyber  metrics  systems  exist  for  recording  real-world  intrusion  events — 
such  as  Verizon’s  VERIS  framework  (Verizon,  2014),  the  SANS  Security  Metrics  (Payne, 
2006),or  the  NIST  standards  for  Incident  Reporting  (Cichonski,  Millar,  Grance,  & 
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Scarfone,  2012),  but  very  few  standard  metries  have  yet  been  widely  adopted  to  record 
and  compare  cyber  actions  taken  during  events.  Therefore,  it  is  nearly  impossible  for 
researchers  at  separate  organizations  to  analyze  the  same  data  set  from  two  different 
angles.  With  the  exception  of  open-source  projects  like  Project  Honeypot  (UnSpam 
Technologies,  2014),  the  HoneyNet  Project  (Project,  2014),  and  others,  publicly  available 
standardized  cyber  attack  data  is  surprisingly  scarce  and  rarely  interoperable. 

2.  Exercises  Hold  Untapped  Data 

Cyber  Exercises,  especially  those  of  a  Red/Blue  Team  nature,  have  the  potential 
to  give  researchers  a  “bird’s  eye  view”  into  a  cyber  attack.  In  almost  all  Red/  Blue  team 
competitions,  there  is  a  White  Team,  or  control  team,  which  can  change  the  situation  on 
the  fly  to  serve  the  objectives  of  the  exercise.  This  team  operates  with  omniscience  and 
full  awareness  of  the  actions  of  both  blue  and  red  teams.  If  the  data  available  to  the  white 
team  could  be  recorded  and  indexed,  it  would  provide  the  otherwise  elusive  visibility 
which  does  not  exist  in  real-world  intrusions,  a  perspective  not  subject  to  the  biases  or 
limits  of  either  defender  or  attacker.  The  data  would  simply  tell  the  story. 

Additionally,  the  frequency  and  diversity  of  cyber  exercises  in  recent  years  could 
potentially  cover  a  wide  variety  of  attack  methods,  mitigations,  and  technologies.  If  taken 
in  aggregate  over  several  years,  just  the  range  and  frequency  of  cyber  exercises  could 
cover  the  most  common  attack  methodologies.  Since  the  attack  and  defense  types  are  as 
diverse  as  the  systems  and  configurations  which  comprise  the  cyber  security  ecosystem, 
it  is  also  essential  that  simple,  standard,  technology-agnostic  metrics  be  built  which  can 
apply  to  all  cyber  exercise  situations.  Similar  to  a  batting  average  in  baseball,  these 
metrics  must  be  universally  usable  and  comparable  across  all  exercises. 

In  order  to  yield  useful  information  for  real-world  decisions,  it  is  imperative  that 
the  exercise  simulates  a  real-world  situation  as  closely  as  possible,  and  that  the  statistics 
gathered  are  standardized  and  repeatable.  The  following  section  will  examine  cyber 
exercises  and  cyber  attack  methodologies,  and  propose  criteria  for  the  design  of  the 
proof-of-concept  exercise. 
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III.  EXERCISE  METHODOLOGY  AND  DESIGN 
CONSIDERATIONS 

A,  CYBER  ATTACK  METHODOLOGY 

In  the  “Hacking  Exposed”  series  of  publications  written  by  Stuart  McClure, 
George  Kurtz,  and  Joel  Scambray,  the  “Anatomy  of  a  Hack”  chart  appears  on  the  inside 
of  the  back  cover.  Hacking  Exposed  l'^  edition,  for  example,  has  the  following  steps, 
shown  in  Table  1. 


1 

Eootprinting 

2 

Scanning 

3 

Enumeration 

4 

Gaining  Access 

5 

Pilfering 

6 

Covering  Tracks 

7 

Creating  Back  Doors 

8 

Denial  of  Service 

Table  1.  Hacking  Exposed  “Anatomy  of  a  Hack”  (from  Scambray,  McClure,  & 

Kurtz,  2012) 

While  various  iterations  of  this  methodology  have  developed  since  Hacking 
Exposed’s  Eirst  Edition,  the  “Anatomy  of  a  Hack”  provides  a  useful  framework  for 
discussing  the  phases  of  any  cyber  intrusion.  More  importantly  for  the  purposes  of  this 
discussion,  these  modules  can  be  used  to  group  the  myriad  tactics  and  techniques  within 
the  cyber  exercise  universe.  However,  Hacking  Exposed  was  written  before  much  of  the 
industry  began  focusing  on  the  “Advanced  Persistent  Threat,”  or  “APT”  style  of  attack, 
which  often  incorporates  client-side  exploitation  and  lateral  movement  (Symantec,  2014). 
Operations  like  these  have  been  described  in  a  collection  of  cyber  research  papers,  from 
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the  Mandiant  APTl  report  published  in  February  2013  (Mandiant,  2013),  to  the 
Elderwood  Project  paper  published  by  Symantec  in  2012  (O'Gorman  &  McDonald,  2012), 
back  to  the  Shady  RAT  and  Night  Dragon  reports  published  by  McAfee  in  201 1  (McAfee, 
2011)  (Alperovitch,  2011).  These  types  of  intrusions  center  around  movement  and 
persistence  within  a  compromised  network,  and  eventual  exfiltration  of  sensitive  data. 
These  operations  can  be  mostly  described  using  elements  of  the  Hacking  Exposed 
Anatomy. 

To  further  supplement  this  categorization,  however,  the  following  life  cycle 
included  in  the  Mandiant  APTl  report  provides  further  modules  and  the  idea  of  an 
internal  loop  in  the  methodology,  shown  in  Table  2: 


1 

Initial  Recon 

2 

Initial  Compromise 

3 

Establish  Eoothold 

Escalate  Privileges 

4 

Internal  Recon 

Move  Eaterally 

Maintain  Presence 

5 

Complete  Mission 

Table  2.  Mandiant  APTl  Methodology 

Reportedly,  many  operations  have  targeted  companies  and  organizations  in  the 
U.S.  housing  intellectual  property  central  to  their  business  model.  Therefore,  to  provide 
data  relevant  to  the  problem  noted  in  Chapter  I,  the  design  of  a  simulation  exercise  should 
follow  the  basic  framework  of  these  real-world  APT  intrusions.  The  next  section  briefly 
describes  general  categories  of  cyber  exercises,  and  how  to  tailor  an  exercise  more 
closely  to  an  APT-style  attack. 
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B,  CYBER  EXERCISE  FRAMEWORKS 

Cyber  competitions  have  matured  and  diversified  greatly  in  the  last  two  decades. 
(Eller,  2004)  Currently,  most  cyber  exercises  can  be  placed  into  four  major  categories, 
shown  below: 

Current  Models  of  Cyber  Exercises 

•  Disaster  Response/  Coordination 

•  Capture  the  Flag-  Race 

•  Capture  the  Flag  -  Battle  Royale 

•  Vulnerability  Discovery  vs.  Remediation 

•  Red  vs.  Blue 

a.  Disaster  Response  and  Coordination 

These  cyber  “exercises”  most  closely  resemble  physical  disaster  response 
exercises,  and  are  often  performed  on  a  regional  or  national  scale.  The  Cyber  Storm 
series  of  cyber  exercises,  for  example,  follow  this  model  (Department  of  Homeland 
Security,  n.d.) 

b.  Capture  the  Flag-Race 

These  are  Capture-The  Flag  competitions  in  which  teams  are  given  points  based 
off  how  many  “flags”  they  capture,  on  a  neutral  system.  The  event  is  timed  and  there  is 
minimal  inter-team  activity.  All  teams  are  essentially  attacking  the  same  system,  and  the 
fastest  team  to  capture  all  the  flags  wins.  Points  can  also  be  deducted  for  hints  or  help, 
affecting  the  aggregate  score  (Symantec,  2014;  The  National  Cyber  Feague,  2013). 
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c.  Capture  the  Flag — Battle  Royale 

In  this  scenario,  teams  try  to  eapture  flags  which  reside  on  other  teams’  networks, 
while  defending  their  own.  This  incorporates  both  attaek  and  defense,  usually  in  a  live- 
fire  exercise.  Figure  1  is  an  example  of  the  “Defcon”-style  exercise: 


Score’bot 

Polls  player  nodes, 

Looking  for  req.  services 


If  all  services  found, 
Score  one  point  for  the 
Flag  currently  on  that 


Player  Nodes 


. . .  while  each  team 
tries  to  replace  others’  flags 


Figure  1 .  Defeon-Style  Capture-the-Flag,  (from  Cowan,  Arnold,  Beattie,  Wright,  & 

Viega,  2003) 


Another  example  of  a  Battle  Royale-style  exercise  is  the  now-defunct  Cipher  CTF, 
which  employed  symmetric  competition  as  well  as  functionality  requirements.  A 
description  of  the  exercise,  taken  from  the  Cipher  CTF  website,  is  below: 

The  exereise  consists  of  multiple  teams,  each  hosting  a  server  that  has 
multiple  serviees  running,  like  e.g.,  a  Webserver,  a  mail  server,  or 
eustomized  services.  The  services  contain  typical  security  vulnerabilities 
that  allow  to  eompromise  the  server  to  a  certain  exten[t].  The  goal  is  to 
maintain  the  services  up,  functional  and  uncompromised  for  the  duration 
of  the  game.  Additional  seores  can  be  gained  by  patching  the 
vulnerabilities  of  the  services  and  exploiting  the  knowledge  of  the  found 
weaknesses  at  the  other  team’s  servers.  (Pimendis,  2010) 

It  should  be  noted  above  that  the  Cipher  CTF  incorporates  a  functionality 
requirement,  ensuring  that  network  defenders  do  not  lock  down  the  network  to  the  point 
of  making  it  unusable.  This  is  an  important  aspect  of  any  real-world  simulation,  beeause 
it  replicates — even  if  only  to  small  extent — the  pressures  that  network  defenders  deal 
with  in  seeuring  their  networks. 
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d.  Vulnerability  Discovery  vs.  Remediation 

In  some  cyber  defense  competitions,  points  are  awarded  to  teams  that  can  identify 
and  remediate  vulnerabilities  first,  and  provide  a  proof  that  the  vulnerability  is  fixed. 
While  this  is  an  important  aspect  of  blue-team  system  and  network  hardening  prior  to  an 
incident,  it  reveals  little  about  incident  response  capability  (Air  Force  Association,  2013) 
(DARPA,  2014). 

e.  Red  vs.  Blue 

In  this  model,  one  red  team  is  tasked  with  attacking  one  blue  team  network.  The 
goal  is  often  the  access  to  or  exfiltration  of  sensitive  data.  Points  are  often  awarded  to  the 
Red  team  for  certain  “flags,”  or  tasks  they  have  to  accomplish.  While  some  “hacking 
back”  may  be  seen  from  the  blue  team,  the  blue  team’s  main  objective  is  simply  to  hold 
off  the  red  team  for  as  long  as  possible  until  the  exercise  ends. 

Therefore,  the  exercise  type  that  most  closely  simulates  APT  activity  against  the 
U.S.  private  sector  and  civilian  government  is  the  Red/Blue  design,  for  several  reasons. 

Active  Threat — Most  Red/  Blue  team  exercises  involve  human  beings  on  both 
sides — hence  a  continual  active  threat,  and  active,  aware  defense.  This  most  closely 
resembles  the  conditions  of  APT  intrusions,  where  malicious  actions  are  thought  to  be 
mostly  done  manually. 

Functionality  Requirements — In  a  Red/  Blue  team  exercise,  the  Blue  Team  is 
defending  a  simulated  “real  network.”  This  often  involves  networks  with  an  expected 
functionality,  not  simply  a  test  range.  Thus,  various  services  and  protocols  must  be  open, 
for  business  to  occur.  This  also  closely  resembles  many  private  and  public  sector 
networks,  where  functionality  often  trumps  security. 

Exfiltration  Goal — Finally,  Red  vs.  Blue  Team  exercises  often  involve  a 
Confidentiality  or  Integrity  compromise — the  type  of  compromise  about  which  many 
companies  are  concerned  when  it  comes  to  APT  actors.  That  goal  determines  much  of  the 
pace  and  mitigations  steps  of  a  blue  team,  versus,  say  destructive  malware  or  DDoS. 
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C.  CONDITIONS  OF  THE  PROPOSED  EXERCISE  FRAMEWORK 

The  “Storyline”  of  the  proposed  exereise  framework  follows  a  blend  of  the 
Hacking  Exposed  and  Mandiant  APTl  methodologies,  with  discrete  steps  toward 
achievement  of  exfiltration  of  sensitive  data.  Additionally,  the  exercise  should  be 
constructed  to  allow  for  the  Red  Team  to  succeed  initial  compromise,  with  minimal,  but 
necessary,  assistance  by  the  White  cell.  This  helps  the  exercise  to  test  cyber  defense 
capabilities,  and  not  solely  pre-configuration  or  defensive  hardening  abilities  on  the  part 
of  the  blue  team. 

a.  Baseline  User  Activity/  Traffic 

Real-World  networks  are  dominated  by  “civilians” — meaning  it  is  not  just  red  and 
blue  team  members.  Much  of  cyber  defense  is  finding  malicious  activity  within  large 
amounts  of  “legitimate”  traffic.  Having  White  Team  members  pretend  to  be  legitimate 
“users”  on  the  network  is  one  way  to  accomplish  this.  There  are  several  open-source 
network  traffic  simulators  available,  and  these  could  be  used  to  create  the  signal/noise 
ratio  in  the  exercise  environment. 

b.  Intelligence/  Threat  Information 

Prior  to  any  attack,  real-world  cyber  defense  teams  may  likely  have  some 
understanding  of  the  threats  they  face — actors  and  attack  vectors.  Beginning  any  exercise 
with  some  basic  threat  intelligence  would  help  simulate  real-world  scenarios  and  give  the 
blue  team  a  way  to  prioritize  resource  allocation. 

c.  Blue  Team  Resources/  Functionality  Requirements 

During  an  attack,  some  have  argued  that  the  blue  team  should  just  unplug  the 
network.  In  real-world  scenarios,  this  is  rarely  an  option.  Blue  teams  have  to  keep  the 
network  to  an  acceptable  level  of  functionality,  or  at  least  pay  a  penalty  for  degraded 
performance  for  legitimate  users.  This  was  incorporated,  for  example,  into  the  above- 
mentioned  Cipher  CTF  exercise.  Additionally,  the  blue  team  cannot  have  unlimited 
monitoring,  storage,  or  analysis  resources.  These  constraints  will  then  help  emulate  the 
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real-world  decisions  that  cyber  defenders  face — decisions  which  are  the  very  foundation 
of  effective  defense  and  incident  response. 

d.  Preexisting  Configurations 

Few  cyber  defense  teams  get  to  build  their  systems  from  the  ground  up.  If 
possible,  blue  teams  should  defend  systems  from  an  existing  state  while  keeping  core 
functionalities  running.  This  best  simulates  real-world  scenarios  where  security  analysts 
and  leaders  inherit  the  existing  vulnerabilities  on  their  network,  and  work  to  fix  those  as 
best  they  can.  Even  more  so  for  cyber-incident  response  teams,  who  may  know  very  little 
about  the  customer  network  before  asked  to  come  on  site. 

e.  Real-World  Duration 

Cyber  defenders  are  already  busy  enough.  It  is  hard  to  find  time  to  do  training, 
and  thus  many  exercises  are  constrained  to  within  a  week  or  two.  Given  this  real-world 
time  constraint,  it  can  be  difficult  to  simulate  an  APT  campaign,  or  even  a  criminal  cyber 
attack,  which  may  in  reality  happen  over  months — or  years  in  some  APT  cases. 

f.  Scope 

Depending  on  the  exercise  architect,  the  contested  network  must  be  scoped.  While 
large-scale  intrusions  may  take  place  on  networks  with  thousands  of  machines,  exercises 
typically  operate  at  several  orders  of  magnitude  lower.  To  replicate  the  internal  network 
of  even  a  medium-sized  corporation  would  require  hundreds  of  machines,  licenses,  and 
infrastructure  devices,  which  most  classroom  environments  may  not  have.  Instead, 
exercises  should  seek  to  use  representations;  a  handful  of  desktop/laptop  machines,  for 
example,  can  effectively  represent  a  1000+  machine  user  environment. 

g.  Provide  Avenue  for  Initial  Compromise 

With  an  advanced  and  persistent  threat,  eventual  initial  compromise  of  some  asset 
is  all  but  guaranteed.  Sooner  or  later,  a  vulnerable  system  (known  or  unknown)  will  be 
exploited.  A  user  will  click  on  a  link  or  open  an  attachment.  A  USB  drive  will  be  plugged 
into  a  system.  A  vulnerable  web  server  will  be  promoted  into  production,  or  a  zero-day 
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exploit  will  bypass  even  the  best  seeurity  team.  Foeusing  on  prevention  is  futile — ^what  is 
important  is  early  deteetion,  resilienee,  and  eradieation  before  sensitive  data  ean  be  stolen. 


h.  Provide  Avenues  to  Move  Laterally/ Establish  Persistence 

Onee  inside  the  network,  APT  aetors  will  often  utilize  preexisting  infrastrueture  to 
move  laterally  -using  legitimate  aeeounts,  file  shares,  ete.  These  methods  are  preeisely 
the  same  tools  and  teehnologies  that  legitimate  users  utilize,  and  as  sueh  they  must 
remain  enabled  for  the  exereise.  This  speaks  to  the  blue  team’s  baseline  funetionality 
requirements;  they  eannot  simply  shut  down  the  basie  funetions  of  the  network,  but  rather 
must  find  the  anomalous  behavior  in  logs  and  artifaets. 

L  Timing 

To  allow  for  full  aetivities  on  both  sides,  but  also  a  full  ability  to  log  and  reeord 
results,  an  exereise  eould  eontain  three  phases,  eaeh  entailing  several  steps.  An  example 
phase  breakout  is  below  in  Table  3,  using  the  various  modules  from  the  Haeking  Exposed 
Anatomy  and  the  APTl  methodology: 


Phase  1 

Scanning 

Enumeration 

Phase  2 

Establish  Foothold 

Escalate  Privileges 

Maintain 

Presence 

Move  Laterally 

Phase  3 

Complete  Mission 

Cover  Tracks 

Tables.  Phases  of  Offense 


If  eonditions  sueh  as  those  noted  above  ean  be  ereated  in  a  elassroom 
environment,  then  the  aetions  and  reaetions  of  an  exereise  in  that  environment  ean 
provide  a  reasonable  simulation  for  aetual  cyber  incidents.  The  following  chapter  will 
describe  the  topological,  technical,  and  temporal  aspects  of  a,  localized,  easy  to  replicate 
“One  Man”  exercise,  as  well  as  a  classroom-setting  Red  vs.  Blue  Exercise,  both  of  which 
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should  resemble  real-world  intrusions.  These  will  not  only  provide  models  for  edueating 
eyber  defenders,  but  also  for  gathering  statisties  for  use  in  eyber  modeling  efforts.  These 
statistics  will  need  to  be  standardized,  repeatable,  and  granular  enough  to  allow  for 
multiple  combinations  and  derivative  statistics  in  future  studies. 

D,  STATISTICAL  MEASURES 

1.  Indicators  and  Infrastructure 

The  statistics  are  intended  to  measure  how  effective  the  Blue  Team’s  detection 
efforts  were  against  the  Red  Team’s  infrastructure.  These  can  be  broken  down  by  attack 
phase  and  totaled  at  the  end.  Table  4  breaks  these  team-based  measures  and  statistics 
down. 


Measures 

Data  Type 

Infrastructure  Nodes  Used  by  Red  Team 

Count 

Unique  Network  Infrastructure  Nodes  Enumerated  by  Blue  Team 

Count 

False  Positive  Network  Indicators 

Count 

Artifacts  Placed  on  Blue  Network  by  Red  Team 

Count 

Artifacts  Detected  by  Blue  Team 

Count 

False  Positive  Artifacts  Found  by  Blue  Team 

Count 

Accounts  Available  to  Red  Team 

Count 

Accounts  Used  by  Red  Team 

Count 

Compromised  Accounts  Discovered  by  Blue  Team 

Count 

Statistics 

Data  Type 

Red  Team  Infrastructure  /Blue  Team  Indicators 

Ratio 

Network  Nodes  Signal/Noise 

Ratio 

Artifacts  Used  Versus  Detected 

Ratio 

Artifact  Signal/Noise 

Ratio 

Accounts  Used  Versus  Discovered 

Ratio 

Account  Signal/Noise 

Ratio 

"Legitimate"  Tools  Used  by  Red  Team 

Number 

Table  4  Team-Based  Measures  and  Statistics 
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2. 


Tool-Based  Statistics 


These  statistics  are  meant  for  network  defenders  to  evaluate  which  tools  helped 
the  most  during  a  given  exercise.  Over  the  course  of  many  exercises,  these  aggregate 
statistics  will  show  which  tools  are  usually  among  the  most  effective  for  network 
defenders.  These  stats  can  also  be  broken  down  by  phase  and  tabulated  at  the  end. 


Measures 

Data 

Type 

Number  of  True  Positives 

Count 

Unique  True  Indicators 

Count 

Number  of  False  Positives 

Count 

Number  of  Instances  of  Tool 

Count 

Red  Team  Blocks 

Count 

Statistics 

Data 

Type 

Signal/Noise  Ratio 

Ratio 

True  Positives  Per  Instance 

Ratio 

Red  Team  "Blocks" 

Number 

%  of  All  True  Indicators 

Ratio 

%  of  All  True  Pre-Exfiltration 

Indicators 

Ratio 

Table  5.  Tool-Based  Measures  and  Statistics 


E,  INCORPORATING  STATISTICS  INTO  CYBER  INVESTMENT  MODEL 

Hypothetically,  for  every  APT  exfiltration  attack,  there  is  a  probability,  at  any 
given  time,  that  it  will  either  succeed  or  not.  The  job  of  the  security  administrator  is  to 
minimize  the  probability  of  a  successful  exfiltration.  To  do  so,  the  security  team  uses 
detections  and  mitigations. 

1,  Evaluating  Alerting  Technologies  -  Theory 

Whether  at  the  network  or  host  level,  cyber  defense  technologies  exist  to  detect, 
alert,  and  block  badness  from  occurring.  Ideally,  a  cyber  defense  technology  would  detect 
all  malicious  indicators  within  its  purview,  and  only  the  malicious  ones  (zero  false 
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positives).  Additionally,  the  teehnology  would  not  miss  any  malieious  activities  (zero 
false  negatives).  In  order  to  circumvent  this,  attackers  seek  to  bypass,  obfuscate,  or 
change  their  infrastructure. 

Rather  than  trying  to  build  an  objective  measure  of  benefit  from  the  ground  up, 
this  approach  allows  an  aggregate  of  attackers  to  empirically  determine  which 
technologies  and  procedures  have  the  greatest  impact  to  their  operations.  The  benefit  of  a 
security  asset  is  then  by  definition  its  negative  effect  on  offensive  operations. 

For  an  attacker  to  make  steps  forward  in  their  operation,  they  must  use 
infrastructure.  Infrastructure  has  a  chance  of  setting  off  alerts,  which  can  provide 
indicators  to  the  blue  team.  The  blue  team  then  can  remove  some  percentage  of  the 
necessary  infrastructure  from  the  red  team.  In  order  to  fully  prevent  the  red  team  from 
moving  forward  (a  “block”)  the  blue  team  must  detect  and  mitigate  all  infrastructure  in 
that  step,  assuming  perfect  detection  (no  false  positives)  and  full  infrastructure 
enumeration  (all  infrastructure  identified). 

Theoretically,  an  attacker  has  a  0%  chance  of  achieving  his  or  her  goal  if  all  their 
offensive  infrastructure  is  discovered.  This  includes  external  IP  address  space  they  may 
use  as  a  jump  point,  and  malware  and  exploits  at  their  disposal,  but  also  the  “internal 
infrastructure”  gained  after  initial  compromise.  This  includes  stolen  account  credentials 
and  internal  points  of  presence,  for  example.  As  a  cyber  defender,  it  is  crucial  to  find  and 
eliminate  this  infrastructure,  or  at  least  take  it  away  from  the  attacker.  Thus,  the  share  of 
prevention  belonging  to  any  one  technology  can  be  reasonably  measured  by  the 
percentage  of  actor  infrastructure  that  a  given  technology  is  responsible  for  finding.  As 
will  be  shown  later,  this  approach  can  also  be  tailored  to  penalize  for  false  positives  and 
false  negatives.  Measuring  the  effectiveness  of  a  technology  is,  in  mathematical  terms,  to 
measure  the  intersection  between  Set  A  (actor  threat  infrastructure)  and  Set  B  (all 
technology-discovered  indicators).  When  applied  to  a  controlled  exercise,  the  expected 
benefit  of  a  technology  can  be  viewed  as  the  sum  of  all  infrastructure  enumerated  during 
the  exercise.  This  is  the  net  impact  that  this  technology  could  have  potentially  had  on  the 
adversary,  assuming  a  perfect  security  team.  The  next  section  breaks  down  the  terms  and 
calculation  of  this  metric. 
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2,  Properties  of  Indicators 

For  each  indicator  “I”  provided  by  a  given  technology,  there  are  the  following 
properties: 

1 .  Appearances  of  “I”  in  total  number  of  records  generated  by  technology,  and; 

2.  Percentage  of  similar  infrastructure  available  to  adversary. 

For  an  example  of  the  first,  take  an  IP  address  detected  by  an  IDS  during  a  scan. 
The  IP  address  appears  in  5%  of  the  alerts.  While  the  technology  obviously  believes 
something  is  awry,  most  alerts  require  validation  and  examination  for  false  positives.  In 
order  to  evaluate  the  technology  independently  of  human  beings  or  skillset,  this  metric 
assumes  a  random  choice  of  any  alert  as  an  indicator.  In  this  case,  there  is  a  5%  chance 
that  this  indicator  would  be  randomly  chosen  out  of  all  alerts. 

With  regards  to  the  second,  it  is  also  important  to  properly  weight  the  relative 
importance  of  a  given  indicator.  For  example,  if  the  network  administrator  blocks  an  IP 
address,  but  the  actor  has  10  similar  IPs  available  from  which  they  can  scan,  the 
administrator  has  only  eliminated  10%  of  the  problem.  Thus,  the  expected  benefit  of  this 
indicator  is  the  chance  of  discovering  it  in  logs  times  the  marginal  impact  it  has  on  the 
active  adversary  infrastructure.  As  another  example,  if  a  malicious  file  is  found  on  a  host 
machine,  but  there  are  10  such  infections,  the  administrator  has  only  removed  10%  of  the 
problem. 

3.  Implementation  Into  a  Simplified  Gordon-Loeb  Model 

Remember  that  the  Gordon-Loeb  model  makes  the  reasonable  assertion  that 
Expected  Benefit  of  Information  Security  (EBIS)  Investment  is  equal  to  the  expected 
change  in  loss  due  to  a  breach.  Assuming  an  APT-style  data  exfiltration  scenario,  the  cost 
of  a  data  breach  can  be  estimated  by  the  intellectual  property  valuation  of  the  data  that 
stands  to  be  stolen.  That  is: 


EBIS  =  P(s)*Worth  of  Asset 

Equation  1 .  Simplified  Gordon-Loeb 


26 


This  P(S)  can  be  substituted  by  a  metrie  ealled  “Detection  Impact.”  For  a  given 
technology,  Detection  Impact  is  measured  as  the  signal  to  noise  ratio  of  the  teehnology 
combined  with  the  impact  of  the  indicators  the  teehnology  diseovers. 

What  this  looks  like,  mathematically,  is: 

4.  Detection  Impact 

For  all  Indicators  Ii,  h,  I3,  etc  generated  by  a  given  Teehnology  T,  the  teehnology 
has  an  aggregate  impact  on  the  adversary’s  ability  to  eomplete  their  mission.  If  there  are 
three  technologies  TA,  TB,  and  TC,  each  responsible  for  discrete,  mutually  exclusive 
indicators: 

Ta:  Iai,  Ia2,  Ia3 
Tb:  Ibi,  Ib2,  Ib3,Ib4 
Tc:  Ici,  Ic2 


Then  the  Detection  Impact  of  Technology  A,  represented  by  D(Ta)  is  equal  to  the  below: 
I^(Ta)  =  '^{Signal  to  Noise  Ratio  of  Indicator  Ijijf)  x  {Indicator  Weight  Factor) 

n 

Equation  2.  Deteetion  Impact 

Signal  to  Noise  Ratio  for  Eaeh  Indieator”  is  caleulated  as: 

True  Positive  Alerts  Generated  Conataining  Indicator  I 
All  Alerts  Generated  by  Technology  T 

Equation  3.  Indicator  Signal  To  Noise  Ratio 

The  Weighting  Eactor  represents  the  share  of  like  actor  infrastructure  that  an 
indicator  represents.  Eor  example,  if  the  attackers  have  3  exploits  available,  and  the 
teehnology  finds  1,  that  weighting  factor  is  1/3. 
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(Number  of  Like  Infrastructure  at  Attacker's  Disposal) 


Equation  4  Indicator  Weight  Factor 


(Alerts  Generated  containing  Indicator  ^  r  1  -i 

(Total  Number  of  Alerts  Generated  By  Tool)]  ^Number  of  Like  Infrastructure  at  Attacker's  Disposal^ 


Equation  5  Breakout  of  Detection  Impact 

As  an  example  of  how  the  summation  works,  suppose  a  technology  detects  three 
indicators.  The  calculation  of  the  Detection  Impact  is  in  Table  6. 


True  Positive 
Total  Alerts 
Signal/Noise  Ratio 


Indicator  1 

Indicator  2 

Indicator  3 

Total 

20 

15 

17 

52 

100 

100 

100 

100 

0.20 

0.15 

0.17 

0.52 

Indicator 

Total  Infrastructure 
Weighting  Factor 


1 

25 

0.0039 


1 

25 

0.0039 


0.0039  0.0039 


Technology  Detection 
Share 

Table  6. 


0.1%  0.2% 


Calculation  of  Detection  Impact 


Table  7  is  an  example  Weighting  Factor  Calculation,  given  an  actor  with  a  /24 
subnet,  20  owned  internal  machines,  and  2  accounts  enumerated. 
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Weight  of  Indicator 

Available 

Weighting  Factor 

IPs  Machines  Accounts 

256  20  2 

0.004  0.025  0.500 

Table  7.  Calculation  of  Weighting  Factor 


5,  Summary 

The  Detection  Impact  metric  rewards  high  true  positives  and  penalizes  false 
negatives,  allowing  defenders  to  compare  technologies  across  various  levels  of  the 
security  stack.  For  example,  an  IDS  detecting  50%  true  positives,  but  which  enumerates 
20%  of  available  actor  IP  space,  has  a  50%  X  20%  =  10%  impact  on  the  actor’s  ability  to 
complete  their  mission.  Conversely,  a  host-based  AV  with  90%  true  positive  rate,  but 
which  misses  1  of  10  malicious  artifacts  has  a  90%  X  10%  =  9%  impact  on  the  actor’s 
operation.  This  metric  can  also  inform  cyber  defenders  where  they  can  optimize  their 
impact  on  the  attackers — for  example,  using  IP  lists  to  block  APT  activity,  or  playing 
“whack-a-mole”  with  an  IP  block  list,  will  not  have  much  impact.  However,  hardening 
hosts  and  updating  defensive  signatures  could  potentially  have  a  significant  effect  on  an 
actor  using  commodity  malware,  by  preventing  or  alerting  on  100%  of  infected  systems. 
This  would  present  a  high  yield  mitigation  step  in  the  cyber  kill  chain.  Thus,  in  addition 
to  evaluating  the  tool’s  detection  abilities,  this  metric  also  rewards  tools  that  can  produce 
high-value  indicators. 
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IV.  METHODOLOGY  AND  DATA  GATHERING  PROOF  OF 

CONCEPT 

A,  FRAMEWORK  FOR  ONE  OR  TWO-PLAYER  EXERCISE 

Overview: 

In  order  to  demonstrate  how  these  metrics  are  measured,  I  created  a  small  battle 
environment  to  conduct  an  exercise.  The  steps  in  this  exercise  fall  into  common 
methodology  steps,  and  the  network  under  attack  retained  its  functionality  throughout  the 
exercise,  hence  satisfying  the  requirements  listed  in  the  previous  chapter.  This  exercise 
modeled  the  exfiltration  of  data  from  an  internal  Windows  environment.  It  incorporates 
scanning,  exploitation — both  outside  in  and  client  side — password  cracking,  lateral 
movement,  and  exfiltration. 

For  the  purposes  of  this  exercise,  each  machine  involved  represents  a  different 
“segment”  of  an  enterprise  network: 

Host/  User  Machine:  Internal  network 

Web  Server:  DMZ 

Windows  2003  Server:  Enterprise  Infrastructure 

Kali  Linux:  Evil  Internet 

Security  Onion:  Security  Infrastructure  (Security  Onion,  2014)1 

Assumptions: 

Eootprinting:  Because  this  was  a  single-user  exercise,  executed  by  myself,  I 
bypassed  the  “Eootprinting”  stage.  Since  I  built  the  entire  exercise,  it  was  impossible  to 
not  know  the  configurations  or  details  of  the  blue  network.  A  proper  Eootprinting  step  is 
outlined  in  the  classroom  exercise  description. 
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Flat  Network: 


While  it  is  possible  to  include  a  virtual  router  and/or  firewall  into  the  single  user 
exercise,  this  exercise  assumes  a  “fiat”  network.  Because  each  machine  in  this  exercise  is 
representative  of  an  enterprise  segment,  there  was  no  functional  need  for  a  router,  as 
shown  in  Figure  2. 


Kali  Linux 


Security  Onion 


Metasploitable  Web  Server 


Windows  XP  Windows  2003  Server 


Figure  2.  Single  or  Dual  User  Exercise  Topology 


Scenario  Design 

Offensive  Steps: 

1 .  Scan  Network 

2.  Exploit  Web  Server 

3.  Client  Side  Exploit  of  Windows  XP  machine 

4.  Hashdump/  Password  crack  to  get  Domain  credentials 

5.  Move  laterally  to  internal  Windows  server 

6.  Exfiltrate  data 

7.  Cover  Tracks 
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Defensive  Functionality  Requirements: 

1 .  Keep  all  maehines  up 

2.  Freeze  IDS  and  AV  signature  updates 

3.  Cannot  block  subnets 

Hardware/Software  Used 

I  used  three  physical  computers,  mostly  due  to  storage  and  performance  concerns. 
I  connected  them  with  a  hub  and  CAT5  Ethernet  cables,  and  kept  them  air-gapped  from 
any  other  network  connections. 

Hardware:  Dell  Laptop  running  Host  OS  of  Kali  Linux  (Offensive  Security  Ltd., 

2014)2 

Dell  Laptop  running  Virtualized  windows  environment  (Internal  network) 
o  Windows  Server  (DNS/  Active  Directory/  Email) 
o  Windows  Host 

Dell  laptop  running  virtualized  Linux  environment  (DMZ) 
o  Metasploitable  (Rapid?,  2012)  (web  server) 
o  Security  Onion  (IDS) 

Ethernet  hub  with  4  network  interfaces 
3  CAT5  Ethernet  cables 

Defensive  Tools: 

Intrusion  Detection  System  -  Security  Onion 

Network  Traffic:  Wireshark 

Host  Anti  Virus — Avast  Lree  AV 

Host  Event  Logs — Windows  Events,  Linux  var/log 
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Offensive  Tools: 


Main  Interface;  Armitage/  Metasploit 

Exploit  Package  Development:  Social  Engineering  Toolkit 

Scanner:  Nmap 

Password  Cracker;  John  the  Ripper  (Johnny) 


B.  PROCESS/  RESULTS 

Using  the  Kali  Linux  machine,  I  began  a  Metasploit  database  and  started 
Armitage.  This  allowed  for  a  graphic  depiction  of  the  attack,  as  well  as  a  good  place  to 
consolidate  scans,  sessions,  and  data  all  in  one  place,  as  shown  in  Eigure  3. 


EigureS.  Mapping  the  Network 
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Phase  1:  Scanning: 

First  step  was  to  scan  the  /24  network  housing  the  Webserver  and  other  machines 
in  the  network.  For  this  I  used  Nmap  within  Armitage. 

Enumeration;  Upon  scanning  the  Metasploitable  web  server,  I  (as  expected)  found 
several  vulnerable  programs.  Armitage  allows  for  a  “Find  Attacks”  option,  which 
examines  the  scan  information  for  possible  exploits. 

Note:  The  exploitation  of  a  Metasploitable  server  is  neither  difficult  nor  unique  in 
nature.  The  use  of  this  well-known  vulnerable  VM  was  to  facilitate  the  exercise,  not  to 
demonstrate  offensive  expertise.  Unfortunately,  real-world  anecdotes  often  reveal 
systems  similarly  vulnerable,  so  it  is  not  out  of  the  question  that  such  a  server  might  be 
running  in  the  wild. 

Phase  2:  Exploitation: 

Using  Armitage,  I  then  executed  a  “Hail  Mary”  against  the  server,  to  see  what 
exploits  would  work  and  which  sessions  I  could  begin.  Sure  enough,  4  sessions  opened. 


Figure  4. 


Sessions  Started  on  Web  Server 


with  varying  privileges,  as  seen  in  Figure  4. 

Lateral  Movement:  Once  a  session  was  established  with  the  Metasploitable  server, 
I  sought  to  make  my  next  move  in  the  methodology — move  laterally  to  the  windows 
domain.  To  do  this,  I  created  an  embedded  .pdf  executable  using  the  Social  Engineering 
Toolkit  (SET):  Once  the  executable  was  constructed,  I  used  a  session  from  Metasploit  to 
upload  the  executable  to  the  var/www/  directory  of  the  Metasploitable  web  server,  as 
shown  in  Figure  5. 

Client-Side  Exploitation  (Maintain  Access):  The  next  step  assumes  a  social 
engineering  success  to  have  someone  in  the  Windows  domain  access  the  pdf  on  the 
Metasploitable  web  server.  I  felt  comfortable  making  this  assumption  because  this  is  not 
only  a  common  social  engineering  tactic,  but  in  this  scenario  the  web  server  and 
Windows  hosts  were  part  of  the  same  organization. 


Applkationt  ^  [O 


$gnM«y25.  6:21AM 


Armit«9« 


f  Cont»*>  X  ]  nrrtap  X  |  nmap  X  )  MM  Mary  x  |  HtOfpftf  2  X  | 


ttcrertttr  > 

/var/MM 

tuntctitr  >  iv 

Listing:  /var/www 

N»d« 

Si/e 

Tjrpr 

Last  Baditlfd 

Hme 

40666/rw>rw>rv(> 

4096 

41r 

2012-0S-20  lSi)9i29  -0400 

dav 

40666/rv*rw>r«* 

4096 

dir 

2012-0^-20  1^1^133  -  0400 

dvwa 

l00666/ncrv«rw« 

891 

111 

2012*06*20  I%i31i37  *0400 

Indri.php 

40666/rw>rw>rv(> 

4096 

dir 

2012*<»>14  eii43iM  -0400 

■uUlUdar 

40M6/rw*rw>rw* 

4096 

dir 

2012-06-14  eii36i40  -0400 

phpItyAOaln 

100666/nt-  rv-  n- 

19 

til 

2010-04-16  02il2i44  -0400 

phplnfo.php 

1006b6/rw>  rw>  rw* 

60618 

111 

2014-06-24  Ili34il0  -0400 

t«a8lat«2.pd( 

40666/n(*rw>rv* 

4096 

dir 

2012-06-14  01i60i38  -0400 

test 

40666/rw*rw*rv* 

20480 

dir 

2010-04-19  18i64il6  -0400 

Uklwlkl 

40666/rw>rw>rw- 

20480 

dir 

2010-04-16  02il7i47  -0400 

Uklwlkl-ald 

40666/rw<rw>rvr< 

4096 

dir 

2010-04-16  16i27i68  -0400 

twiki 

Q  (rootOl^'l  *] 

n  [fooi^katil  *]  11}  [foot^katil  *] 

■1  Armta^a 

lO 

Figures.  Uploading  the  Exploit  PDF 
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On  the  Kali  Linux  machine,  I  began  a  listener  on  port  443.  I  coded  this  port  as  a 
callback  on  the  pdf  executable  because  it  is  commonly  allowed  out  on  most  internal 
networks.  Once  the  pdf  was  downloaded  by  a  victim  on  the  web  server,  it  immediately 
executes  and  calls  back  to  the  attacking  computer.  Now  that  I  had  access  to  the  windows 
domain,  I  performed  an  “arp  -a”  command  on  the  XP  host  to  see  what  other  computers  it 
was  talking  to.  The  arp  sweep  revealed  the  Windows  server. 

Moving  Laterally:  Additionally,  while  on  the  XP  machine,  I  performed  a  dump  of 
all  the  stored  credentials,  using  the  “hashdump”  meterpreter  command.  Once  I  had  the 
hashes,  I  set  about  cracking  them  using  John  the  Ripper  with  the  default  wordlist  that 
came  with  the  Kali  Linux  build.  Once  the  Administrator  password  had  been  cracked,  I 
sought  to  map  to  a  share  within  the  Windows  domain  to  simulate  lateral  movement.  I 
used  the  net  view/  net  use  commands,  authenticating  to  the  Windows  server  as 
Administrator.  This  is  shown  in  Figure  6. 

Phase  3:  Exfiltration: 

Once  I  had  successfully  mapped  the  drive,  I  could  explore  the  fde  system  through 
meterpreter,  and  download  the  secret  fde,  as  shown  in  Figure  7. 

Cover  Tracks:  Finally,  to  cover  my  tracks,  I  used  the  “clear  event  log”  function 
within  the  meterpreter  shell,  on  the  XP  host.  Additionally,  I  wrote  ““  to  the  var/log/syslog 
fde  on  the  Metasploitable  server.  I  was  unable  to  wipe  logs  on  the  Windows  server, 
however,  since  I  did  not  have  shell  access  but  rather  simply  mapped  the  drive. 
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Figure  6.  Mapping  Network  Drive 


ApplIcMions  Pl*c«s  f  Q 


Sun  May  25. 10:21AM 


Amita^a 


^mta^a  ^taw  yout  Attacks  £»rktpacai  ya^P 


Figure  7.  View/  Exfiltrate  Sensitive  Information 
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MONITORING/  GATHERING  STATISTICS 


On  the  blue  team  side,  there  were  several  plaees  to  gather  both  live  and  after-the- 
fact  statistics  on  the  attack.  I  wanted  to  be  sure  that  these  tools  would  indeed  gather  alerts, 
and  also  examine  the  data  to  indeed  see  if  the  statistics  were  viable  and  sensible. 

Network  Technology: 

The  security  onion  VM  served  as  a  useful,  if  somewhat  finicky,  IDS  alert 
platform.  Security  Onion  provides  a  wealth  of  different  IDS  and  PCAP  technologies;  I 
chose  to  use  SQUIL  in  this  instance,  as  shown  in  Figure  8. 


Figure  8.  IDS  Alerts 
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Network  IDS  Technology  Evaluation: 

The  following  calculations  were  performed  to  calculate  the  Detection  Impact  of 
the  IDS  technology.  There  were  97  total  alerts  from  scanning,  revealing  the  attacker  IP 
address.  Additionally,  a  total  of  6  alerts  fired  on  exploitation,  covering  2  of  the  4  exploits 
launched. 


IP  Address  1 

Exploit  1 

Exploit  2 

Total 

True  Positive 

97 

4 

2 

101 

Total  Alerts 

174 

174 

174 

174 

Detection  Rate 

0.56 

0.02 

0.01 

0.58 

Indicator 

1 

1 

1 

Total  Infrastructure 

256 

4 

4 

Weighting  Factor 

0.0039 

0.2500 

0.2500 

0.1680 

Detection  Impact 

0.2% 

0.6% 

0.3% 

9.7% 

Table  8.  Calculation  of  IDS  Detection  Impact 


Host-  Based  Technology  Evaluation: 

I  installed  Avast!  Free  Anti-Virus  on  the  XP  host.  While  it  alerted  on  the  initial 
pdf  exploit  and  another  malicious  .pdf  fide  I  uploaded,  (as  seen  in  Figure  9),  it  did  not 
alert  on  a  persistence  agent  uploaded  after  a  session  had  been  established.  This  speaks  to 
the  low  false  positive  rate  of  Anti-Virus  solution — ^but  also  the  false  negative  rate.  In  fact, 
Symantec  recently  reported  that  they  estimate  their  AV  solution  only  catches  45%  of 
current  cyber  attacks  (Yadron,  2014).  In  comparison  to  the  IDS,  the  Detection  Impact 
statistic  accounts  for  the  false  negative  rate,  even  though  the  Signal/Noise  ratio  was 
perfect. 
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Figure  9.  Avast!  AV  Alerts 


Artifact  1  Artifact  2  Artifacts  Total 
True  Positive  4  3  0  4 

Total  Alerts  7  7  7  7 

Detection  Rate  0.57  0.43  0.00  0.57 


Unique  Indicator  111 

Total  Infrastructure  Type  3  3  3 

Weighting  Factor_ 0.3333_ 0.3333_ 0.3333  0.3333 

Detection  Impact  19.0%  14.3%  0.0%  33.3% 

Table  9.  Calculation  of  Avast  Detection  Impact 

To  elaborate  on  this  metric,  the  reason  why  the  AV  solution  had  a  higher 
Detection  Impact  is  primarily  because  had  AV  alerted  on  all  three  artifacts,  and  assuming 
rapid  or  immediate  mitigation  had  been  taken  against  them,  I  as  the  adversary  would  have 
been  effectively  blocked  from  my  mission. 

Summary: 

The  Detection  Impact  metric  was  intended  to  provide  a  platform-agnostic  measure 
of  a  technology’s  ability  to  provide  a  blue  team  with  a  full,  clear  picture  of  malicious 
actor’s  presence  on  the  network.  While  the  metric  may  not  account  for  technological 
externalities  or  extreme  cases,  it  can  be  used  to  compare  detection  technologies  at  the 
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network,  domain,  and  host  levels.  If  the  aggregate  seore  of  a  seeurity  staek  approaehes  1, 
then  the  Blue  Team  is  in  a  position  to  effeetively  bloek  the  aetors  from  sueeess  in  their 
mission.  In  the  examples  of  the  Single-User  Exereise,  the  Deteetion  Impaet  metric  did 
indeed  land  between  0  and  1  for  both  IDS  and  AV,  and  provided  comparable  values  of 
9.7%  and  33.3%,  respectively. 

D,  EXTENSIONS  TO  CLASSROOM  EXERCISE 

Building  off  the  basics  of  the  small  exercise,  a  classroom  exercise  can  be  built 
with  similar  principles,  with  more  machines,  more  technologies,  and  thus  more 
opportunities  for  evaluation.  Instead  of  single  machines  representing  the  Attacker,  DMZ, 
and  Internal  Network,  more  machines  can  be  placed  into  segments,  better  simulating  a 
real  enterprise  network.  An  example  of  what  a  classroom  exercise  might  look  like  is  in 
Figure  10. 
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Overview: 


Figure  10.  Classroom  Exercise  Topology 


Essentially,  the  exercise  should  preserve  the  concepts  and  methodology  of  initial 
compromise,  privilege  escalation,  lateral  movement,  and  exfiltration.  This  methodology 
resembles  real-world  activity,  and  will  also  help  make  sense  of  the  statistical  data 
gathered.  If  the  exercise  were  extended  to  the  classroom  environment  and  with  a  larger 
network,  the  methodology  could  look  like  the  below: 

1.  Scan  Network 

a.  Can  scan  the  whole  network,  or  just  DMZ.  Additionally, 
Red  Team  can  switch/spoof  IPs 

2.  Exploit  Web  Server 
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a.  There  should  be  several  servers  in  the  DMZ.  Web  server 
could  form  an  initial  point  of  entry,  but  DNS  and/or  Email 
could  also  be  leveraged. 

3.  Client  Side  Exploit  of  Windows  XP  machine 

a.  This  could  take  several  forms.  In  my  example,  I  pushed  a 
malicious  file  to  the  Webserver — ^but  a  spear-phishing 
attack  or  other  abuse  of  trusted  relationships  between  the 
DMZ  and  internal  network  could  be  used. 

4.  Hashdump/  Password  crack  to  get  Domain  credentials 

a.  This  could  entail  compromise  of  a  domain  controller,  rather 
than  simple  password  cracking;  however,  cracking  hashes 
or  pass-the-hash  remain  relatively  simple  ways  to  achieve 
the  same  goal. 

5.  Move  laterally  to  internal  Windows  server 

a.  Since  there  are  several  servers,  this  can  be  expanded  to 
include  a  file  server  or  database. 

6.  Exfiltrate  data 

a.  Rather  than  a  direct  download  through  a  shell,  a  classroom 
exercise  could  incorporate  an  internal  blue  network  ETP 
server  as  a  pivot  point. 

7.  Cover  Tracks — This  could  potentially  involve  data  deletion,  or 
centralized  logging  corruption — depending  on  the  scope  of  the 
exercise. 

Timing: 

The  three  phased  approach  in  the  single/dual  user  exercise  worked  well,  allowing 
me  to  perform  Red  Team  functions,  stop,  and  then  examine  the  Blue  Team  detection 
tools  for  alerts  and  information.  Therefore,  the  proposed  timing  of  the  classroom  exercise 
is  to  break  the  exercise  into  three  phases  as  well.  However,  these  phases  are  further  split 

44 


into  “innings,”  alternating  activity  between  teams  in  order  to  objectively  measure  the 
effectiveness  of  the  team,  tools,  and  tactics. 


Phase  1 

Phase  2 

Phase  3 

Offense 

Scan 

Gain  Privileges 

Complete  Mission 

Enumerate/  Exploit 

Move  Laterally 

Cover  Tracks 

Submit  Vulnerable  Hosts  and 
Services,  along  with 
Infrastructure  used,  to  White 
Cell 

Submit  Internal  Network 

Infrastructure  to  White  Cell 

Submit  Exfiltrated 

Info  to  White  Cell 

Defense 

Examine  IDS  Alerts  and  FW 
Logs 

Examine  AV and  other 
Network/Host  Technologies 

Examine  Forensic 
Images 

Submit  IP  and  signature 
Indicators  to  White  Cell 

Submit  Artifacts  and 

Accounts  to  White  Cell 

Submit  Exfiltrated 

Info  and  Evidence 

to  White  Cell 

Table  10.  Classroom  Exercise  Phases 


Hacking  Back: 

For  the  purposes  of  this  exercise,  “hacking  back,”  that  is,  offensive  activities 
undertaken  by  the  blue  team,  are  not  allowed.  The  reason  in  this  case  is  that  in  a  “hack 
back”  scenario,  the  offensive  blue  team’s  objective  is  not  data  exfiltration  but  rather 
destruction  and  disruption.  While  in  a  wartime  scenario,  this  may  apply,  in  economic 
espionage  cases  it  represents  a  fundamentally  separate  paradigm. 


Additional  Modules  to  Consider: 

Security  Team  Under  Surveillance;  In  Phase  2,  the  Red  team  gains  visibility  on 
the  security  devices  and  actions  of  the  Blue  team. 

Destructive  Payload:  In  Phase  3,  the  red  team  destroys  or  deletes  data  as  a  final 
step,  rather  than  exfiltration.  It  is  important  for  assessment  purposes  that  records  be 
copied  by  the  white  team  before  this  action,  in  order  to  ensure  historical  data  is  not 
deleted  as  part  of  the  exercise  itself. 

DDoS  as  a  distraction:  During  any  of  the  phases,  DDoS  could  be  used  by  the  Red 
Team  to  distract  or  disrupt  Blue  Team  operations  in  the  midst  of  the  exercise. 
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V.  CONCLUSIONS  AND  FUTURE  WORK 


A,  LESSONS  LEARNED 

Executing  the  single-user/  dual  user  exercise,  and  recording  statistics  while  doing 
so,  illuminated  many  lessons  about  how  statistics  might  be  gathered  in  a  classroom 
setting.  Additionally,  the  various  different  iterations  of  systems  that  needed  to  be  created, 
destroyed,  and  created  again  made  it  clear  that  these  statistics  should  be  general,  not 
specific  to  any  given  technology.  They  must  also  be  specific  enough  to  highlight  attacker 
and  defensive  weaknesses  at  every  stage  of  the  exercise. 

Given  the  technical  effort  involved  in  full  statistical  capture  for  the  single-user 
exercise,  the  classroom  exercise  could  become  prohibitively  complicated  if  not  properly 
scoped  and  planned.  In  order  to  gather  meaningful  statistics,  there  must  be  a  solid 
understanding  of  security  device  configuration,  visibility,  and  output  in  order  to 
objectively  measure  its  effectiveness.  Additionally,  while  maintaining  an  authoritative 
timeline  and  a  compendium  of  statistics  is  a  hard  task,  it  must  be  a  priority  for  the  white 
cell. 

Cyber  defense  technologies  often  involve  complicated  and  obscure  commands, 
interfaces,  and  terms.  While  it  may  be  easy  to  blame  the  opaqueness  of  a  tool’s  output  on 
the  user,  there  is  something  to  be  said  for  ease  of  usability  and  interpretation.  If  a  tool 
requires  months  or  years  of  training,  then  the  investment  for  that  tool  is  far  higher  than 
the  sticker  price.  Therefore,  while  missed  alerts  or  indicators  may  technically  be  due  to  a 
blue  team’s  lack  of  skill,  tools  which  require  extensive  training  will  be  scored  lower  than 
easy-to-use  tools. 

B,  FUTURE  WORK 

While  measuring  a  technology’s  ability  to  detect  actor  tools  and  infrastructure  is  a 
fair  measure,  that  knowledge  is  no  good  if  a  blue  team  cannot  remove  malicious  access  in 
time.  Statistics  must  be  tested  to  measure  the  time  it  takes  for  a  blue  team  to  translate 
alert  data  into  indicators,  and  indicators  into  mitigations. 
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Time-Based  Statistics: 


Theory:  If  a  mitigation  action  is  taken  to  remove  aetor  aceess  after  the  actor  has 
already  aehieved  their  mission,  there  is  little  point.  The  important  thing  from  a  cyber 
defense  perspective  is  to  operate  inside  the  operation  loop  of  the  adversary.  Thus,  the 
grand  measure  of  suecess  or  failure  is  whether  or  not  alerts  yielded  indieators,  and 
indieators  preeipitated  mitigations,  fast  enough  to  affeet  the  adversary’s  goals. 

Implementation:  Each  phase  of  the  exercise  is  evaluated  on  time  of  first  alert,  to 
time  of  first  indicator  submitted,  to  time  of  first  mitigation  put  in  plaee.  This  measures 
people  and  proeesses  ability  to  analyze  alert  information,  gain  confidenee  in  indieators, 
and  then  implement  mitigations.  Thus,  the  time  from  first  alert  to  mitigation,  as  eompared 
to  first  eontact  to  mission  completion,  is  the  measure  of  the  effeetiveness  of  a  proeess  or 
personnel  during  an  exereise. 

The  below  statisties  could  potentially  be  used  to  measure  the  team’s  timely  or 
untimely  performanee — measuring  reaetion  to  information,  ability  to  adapt,  quality  of 
deteetion,  and  other  qualities  whieh  apply  to  the  human  components  of  red  and  blue 
teams.  These  metries  require  testing  and  validation,  as  part  of  future  projects. 


Measures 

Data  Type 

Time  of  First  Contact— White  Cell 

Time 

Time  of  First  Alert  (IDS  or  AV) 

Time 

Time  of  First  Compromise 

Time 

Time  of  First  Exfiltration 

Time 

Time  of  Final  Packet 

Time 

Statistics 

Data  Type 

Time  until  Network  Compromised  (First  Entry) 

Duration 

Time  between  First  Contact  and  First  Alert 

Duration 

Time  between  First  Contact  and  Compromise 

Duration 

Time  between  Compromise  and  Exfiltration 

Duration 

Time  until  Data  Exfiltrated 

Duration 

Time  Between  First  Alert  and  Exfiltration 

Duration 

Table  11.  Time-Based  Statistics 
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Mitigation-  Based  Statistics: 

It  is  imperative  that  the  blue  team  keep  traek  of  what  mitigations  are  put  in  place, 
and  when.  This  is  critical  to  recognizing  which  mitigations  had  the  most  effect  on  the 
actors’  ability  to  exfiltrate  data. 


Metric 

Data  Type 

Network  Mitigation: 

(Time,  indicator) 

Artifact  Mitigation: 

(Time,  indicator) 

Account  Mitigation: 

(Time,  indicator) 

Table  12.  Mitigation  Statistics 


For  each  mitigation  (Indicator  Submitter,  Layer  of  Mitigation,  Time  of  Submission),  the 
timeliness  can  be  viewed  using  the  following  approach 


Mitigation  Timeliness:  For  Each  Mitigations  M  which  is  informed  by  Alerts  Ai, 
A2,  A3,  etc  containing  Indicators  L,  I2, 13,  etc: 


Average  [(Time  from  First  Alert  Al  to  Mitigation  Ml — (Time  from  First  Contact  to 
Exfiltration)]  /  (Time  from  First  Contact  to  Exfiltration) 

Equation  6.  Mitigation  Timeliness 


This  concept  is  portrayed  graphically  in  Eigure  11. 
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Figure  1 1 .  Mitigation  Timeline 

Rather  than  simply  evaluating  teehnology,  these  metrics  can  be  used  to  evaluate 
processes  and  technology — namely,  room  for  improvement.  That  can  justify  a  need  to 
devote  resources  to  procedure  development  and  practice,  or  training  for  Incident 
Response  Personnel  to  more  quickly  turn  alerts  to  indicators  and  indicators  to  mitigations. 
While  not  as  discrete  as  the  metrics  for  technology  implementation,  they  can  provide  a 
good  measure  of  which  processes  are  most  important  to  focus  on,  or  which  areas  of  a 
team  or  procedure  need  improvement. 

C.  CONCLUSIONS 

In  order  for  the  concept  of  exercises  as  sources  of  statistics  to  help  cyber 
modeling  efforts,  cyber  exercises  need  to  adopt  more  robust  statistical  gathering 
mechanisms,  and  share  them  openly  with  the  community.  This  paper  has  demonstrated 
the  need  for  and  means  by  which  these  statistics  can  be  generated  and  integrated  into 
existing  models.  These  are  first  attempts;  it  is  my  hope  that  better  and  more  elegant 
manners  of  translation  are  built  and  executed. 

Secondly,  it  is  imperative  that  there  be  a  clearinghouse  of  these  statistics,  and  that 

they  be  made  public  for  everyone’s  consumption.  These  statistics  will  gain  accuracy  as 

the  sample  size  grows.  It  is  possible,  for  example,  to  publish  the  single  user  exercise  and 

let  people  submit  their  own  statistics  with  verification  from  a  third  party.  If  the  data  is 

crowd-sourced  in  that  manner,  it  could  gain  traction  in  the  community  and  help  to  build  a 
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much  larger  data  set  of  attaek  and  defense  stats.  Additionally,  if  elassroom  Red/Blue 
exereises  traek  statisties  properly,  that  data  ean  be  ineorporated  into  the  larger  eommunity. 
If  properly  indexed  and  eategorized,  this  data  eould  finally  solve  the  laek  of  operational 
data  that  eyber  seeurity  researeh  suffers  from. 

Finally,  eyber  seeurity  researehers  must  use  this  data  to  build  and  test  eost  and 
investment  models.  If  these  models  ean  be  developed,  tested,  and  used  to  justify  real 
world  deeisions. 
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